Multilingual Code Snippets Training for Program Translation

Abstract

Program translation aims to translate source code from one programming language to another. It is particularly useful in applications such as multiple-platform adaptation and legacy code migration. Traditional rule-based program translation methods usually rely on meticulous manual rule-crafting, which is costly in terms of both time and effort. Recently, neural network based methods have been developed to address this problem. However, the absence of high-quality parallel code data is one of the main bottlenecks that impedes the development of program translation models. In this paper, we introduce CoST, a new multilingual Code Snippet Translation dataset that contains parallel data from 7 commonly used programming languages. The dataset is parallel at the level of code snippets, which provides much more fine-grained alignments between different languages than the existing translation datasets. We also propose a new program translation model that leverages multilingual snippet denoising auto-encoding and Multilingual Snippet Translation (MuST) pre-training. Extensive experiments show that the multilingual snippet training is effective in improving program translation performance, especially for low-resource languages. Moreover, our training method shows good generalizability and consistently improves the translation performance of a number of baseline models. The proposed model outperforms the baselines on both snippet-level and program-level translation, and achieves state-of-the-art performance on the CodeXGLUE translation task. The code, data, and appendix for this paper can be found at https://github.com/reddy-lab-code-research/MuST-CoST.
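
The two ideas named in the abstract, snippet-level alignment and denoising auto-encoding, can be illustrated with a small Python sketch. This is not the authors' implementation. The record layout, the function names `snippet_pairs` and `mask_tokens`, the `<mask>` token, and the token-masking noise scheme are all assumptions made for illustration; the abstract does not specify how CoST is stored or how noise is applied.

```python
import random
from itertools import permutations

# Hypothetical toy record: one program, split into snippets that are
# aligned position by position across languages (not real CoST data).
aligned_program = {
    "Python": ["n = int(input())", "print(n * 2)"],
    "Java": ["int n = sc.nextInt();", "System.out.println(n * 2);"],
}


def snippet_pairs(record):
    """Return (src_lang, tgt_lang, src_snippet, tgt_snippet) tuples
    for every ordered pair of languages in the record."""
    pairs = []
    for src, tgt in permutations(record, 2):
        for s, t in zip(record[src], record[tgt]):
            pairs.append((src, tgt, s, t))
    return pairs


def mask_tokens(tokens, mask_prob=0.15, mask_token="<mask>", seed=0):
    """Replace each token with mask_token with probability mask_prob.
    A denoising auto-encoder would be trained to recover `tokens`
    from the returned noisy sequence (assumed noise scheme)."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < mask_prob else tok for tok in tokens]


pairs = snippet_pairs(aligned_program)
noisy = mask_tokens("print ( n * 2 )".split(), mask_prob=0.3)
```

With two languages and two aligned snippets each, `snippet_pairs` yields four directed training pairs; adding a language grows the number of directions quadratically, which is one way to see why snippet-level multilingual data can help low-resource languages.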

Similar resources

Interactive Synthesis of Code Snippets

We describe a tool that applies theorem proving technology to synthesize code fragments that use given library functions. To determine candidate code fragments, our approach takes into account polymorphic type constraints as well as test cases. Our tool interactively displays a ranked list of suggested code fragments that are appropriate for the current program point. We have found our system t...

A Multilingual Framework for Searching Definitions on Web Snippets

This work presents Mdef-WQA, a system that searches for answers to definition questions in several languages on web snippets. For this purpose, Mdef-WQA biases the search engine in favour of some syntactic structures that often convey definitions. Once descriptive sentences are identified, Mdef-WQA clusters them by potential senses and presents the most relevant phrases of each potential sense ...

Dynamic and Interactive Synthesis of Code Snippets

Tool for Fast Detection of Java Code Snippets

This paper presents general results on the Java source code snippet detection problem. We propose a tool that uses graph and subgraph isomorphism detection. A number of solutions for all of these tasks have been proposed in the literature. However, although all these solutions are really fast, they compare just the constant static trees. Our solution offers to enter an input sample dyna...

Applications in Multilingual Machine Translation

The CAT2 Machine Translation System, developed in Saarbrücken in 1987, is a natural language application coded entirely in Prolog. Since its initial development, several languages have been implemented on an experimental basis to evaluate the translation methodology, the underlying formalism, the linguistic descriptions, and the effectiveness of the Prolog implementation. Seven years later, it...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i10.21434